The Perceptron Algorithm with Uneven Margins
نویسندگان
چکیده
The perceptron algorithm with margins is a simple, fast and effective learning algorithm for linear classifiers; it produces decision hyperplanes within some constant ratio of the maximal margin. In this paper we study this algorithm and a new variant: the perceptron algorithm with uneven margins, tailored for document categorisation problems (i.e. problems where classes are highly unbalanced and performance depends on the ranking of patterns). We discuss the interest of these algorithms from a theoretical point of view, provide a generalisation of Novikoff’s theorem for uneven margins, give a geometrically description of these algorithms and show experimentally that both algorithms yield equal or better performances than support vector machines, while reducing training time and sparsity, in classification (USPS) and document categorisation (Reuters) problems.
منابع مشابه
Using Uneven Margins SVM and Perceptron for Information Extraction
The classification problem derived from information extraction (IE) has an imbalanced training set. This is particularly true when learning from smaller datasets which often have a few positive training examples and many negative ones. This paper takes two popular IE algorithms – SVM and Perceptron – and demonstrates how the introduction of an uneven margins parameter can improve the results on...
متن کاملFlexible Margin Selection for Reranking with Full Pairwise Samples
Perceptron like large margin algorithms are introduced for the experiments with various margin selections. Compared to the previous perceptron reranking algorithms, the new algorithms use full pairwise samples and allow us to search for margins in a larger space. Our experimental results on the data set of (Collins, 2000) show that a perceptron like ordinal regression algorithm with uneven marg...
متن کاملPerceptron Learning for Chinese Word Segmentation
We explored a simple, fast and effective learning algorithm, the uneven margins Perceptron, for Chinese word segmentation. We adopted the character-based classification framework and transformed the task into several binary classification problems. We participated the close and open tests for all the four corpora. For the open test we only used the utf-8 code knowledge for discrimination among ...
متن کاملMicrosoft Cambridge at TREC 2002: Filtering Track
Six runs were submitted for the Adaptive Filtering track, four on the adaptive filtering task (ok11af??), and two on the routing task (msPUM?). The adaptive filtering system has been somewhat modified from the one used for TREC–10, largely for efficiency and flexibility reasons; the basic filtering algorithms remain similar to those used in recent TRECs. For the routing task, a completely new s...
متن کاملThe SVM With Uneven Margins And Chinese Document Categorisation
We propose and study a new variant of the SVM — the SVM with uneven margins, tailored for document categorisation problems (i.e. problems where classes are highly unbalanced). Our experiments showed that the new algorithm significantly outperformed the SVM with respect to the document categorisation for small categories. Furthermore, we report the results of the SVM as well as our new algorithm...
متن کامل